1 Introduction

filled out by Daniel



2 Methodology

partly filled out by Daniel;


Data preparation

Searches with a volume of 0 are removed from the data set. For the Ahrefs analysis, only searches with a volume > 100 are considered. All analyses are based on the US location.
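The two filtering rules above can be sketched in plain Python. This is only an illustration under stated assumptions: the record layout, the field name `volume`, and the sample rows are hypothetical and do not come from the actual pipeline.

```python
# Hypothetical records; the real data set and its field names are not shown in this report.
searches = [
    {"keyword": "youtube", "volume": 185_000_000},
    {"keyword": "blue widget price", "volume": 90},
    {"keyword": "never searched phrase", "volume": 0},
]

MIN_VOLUME = 100  # threshold stated above: only volume > 100 is analysed


def drop_zero_volume(rows):
    """Remove searches whose volume is exactly 0."""
    return [row for row in rows if row["volume"] != 0]


def enrichment_subset(rows):
    """Keep only searches with a volume strictly greater than 100."""
    return [row for row in rows if row["volume"] > MIN_VOLUME]


cleaned = drop_zero_volume(searches)
enriched_candidates = enrichment_subset(cleaned)
print(len(cleaned), len(enriched_candidates))
```

Keeping the two filters as separate functions mirrors the text: the zero-volume rule applies to every analysis, while the volume threshold applies only to the enriched subset.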



Data enrichment

~2.5 million searches were enriched with Ahrefs data. The added statistics are difficulty, return rate, clicks, clicks per search (cps), region volume, and SERP features.



Overview of the data

Overview

| Statistic | Value |
| --- | --- |
| Total number of searches | ~306 million |
| Total volume of searches | ~303 billion |
| Searches with missing volume | 0.51% |
| Mean search volume | 989 |
| Median search volume | 10 |
| Mean CPC | 0.61 |



3 Research Findings


Questions in searches

~14% of searches are in the form of a question. “how” is the most common question word.
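One way such a share could be computed is to flag a search as a question when its first word is a question word. The report does not state the rule that was used, so both the rule and the word list below are assumptions, and the sample queries are made up.

```python
from collections import Counter

# Assumed list of question words; the report does not say which words were used.
QUESTION_WORDS = {"how", "what", "why", "when", "where", "who", "which", "can", "is", "does"}


def question_word(query):
    """Return the leading question word of a query, or None if there is none."""
    tokens = query.lower().split()
    if tokens and tokens[0] in QUESTION_WORDS:
        return tokens[0]
    return None


# Made-up sample queries, for illustration only.
queries = ["how to tie a tie", "weather", "what is seo", "how old is the earth", "youtube"]

counts = Counter(w for w in map(question_word, queries) if w is not None)
share = sum(counts.values()) / len(queries)
top_word = counts.most_common(1)[0][0]
print(f"{share:.0%} of queries are questions; most common question word: {top_word}")
```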



Stopwords

“how” and “the” are the most common stopwords, each present in 6-8% of searches.



Keyword length

The most-searched queries are 6-9 characters long, and search volume falls steadily for queries longer or shorter than that.



Keyword info categories

Internet & Telecom is the keyword category with the highest mean volume.

Arts & Entertainment, Internet & Telecom, and News, Media & Publications have the highest total volume.

Finance has the highest mean cost per click.



Keyword difficulty

Difficulty increases with volume.

From a linear regression, we find that each doubling of volume is associated with a 1.63-point increase in difficulty, i.e. difficulty grows roughly linearly in the logarithm of volume.
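A "per doubling" effect corresponds to the slope of a least-squares fit of difficulty against log2(volume), since one unit of log2(volume) is one doubling. The sketch below shows that calculation on synthetic data; the function name and the data are illustrative assumptions, and the numbers are not the report's.

```python
import math
from statistics import mean


def slope_per_doubling(volumes, difficulties):
    """Least-squares slope of difficulty against log2(volume).

    Because x is log2(volume), a step of 1 in x is one doubling of volume,
    so the slope is the change in difficulty per doubling.
    """
    xs = [math.log2(v) for v in volumes]
    x_bar, y_bar = mean(xs), mean(difficulties)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, difficulties))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data built with a slope of exactly 2 points per doubling,
# purely to check the function; these are not the report's numbers.
volumes = [128, 256, 512, 1024, 2048]
difficulties = [5 + 2 * math.log2(v) for v in volumes]
slope = slope_per_doubling(volumes, difficulties)
print(round(slope, 2))
```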


Difficulty and CPC are also correlated.



Spell types

Most of the searches with the highest volumes are attempts to go to a popular website.

Searches with highest volume

| keyword | location | spell | spell_type | keyword_info_search_volume |
| --- | --- | --- | --- | --- |
| jou tube | 2840 | youtube | showing_results_for | 1.85e+08 |
| youtube the | 2840 | | | 1.85e+08 |
| youi tue | 2840 | youtube | showing_results_for | 1.85e+08 |
| acerook | 2840 | | | 1.85e+08 |
| youetube | 2840 | | | 1.85e+08 |
| you tbut | 2840 | youtube | did_you_mean | 1.85e+08 |
| ykutube | 2840 | youtube | showing_results_for | 1.85e+08 |
| uotod | 2840 | youtube | showing_results_for | 1.85e+08 |
| utuen | 2840 | youtube | did_you_mean | 1.85e+08 |
| ytu tube | 2840 | | | 1.85e+08 |


As a result, about half of all search volume comes from searches with a spell type, although only ~1.4% of searches have one.
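The gap between the two percentages comes from counting searches in one case and summing their volume in the other. A minimal sketch of both calculations follows; the rows and field names are made up for illustration, with `spell_type` set to `None` when a search has no spell type.

```python
# Made-up rows for illustration; "spell_type" is None when no correction was applied.
rows = [
    {"keyword": "youi tue", "spell_type": "showing_results_for", "volume": 900},
    {"keyword": "cheap flights", "spell_type": None, "volume": 400},
    {"keyword": "red shoes", "spell_type": None, "volume": 300},
    {"keyword": "garden hose", "spell_type": None, "volume": 200},
]


def spell_type_shares(rows):
    """Return (share of searches, share of volume) that have a spell type."""
    with_spell = [r for r in rows if r["spell_type"] is not None]
    search_share = len(with_spell) / len(rows)
    volume_share = sum(r["volume"] for r in with_spell) / sum(r["volume"] for r in rows)
    return search_share, volume_share


search_share, volume_share = spell_type_shares(rows)
print(f"{search_share:.0%} of searches, {volume_share:.0%} of volume")
```

In this toy sample a single high-volume misspelling makes up a quarter of the searches but half of the volume, which is the same kind of skew the report describes at a much larger scale.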

The most common intended searches behind misspelled queries:

Top 10 intended searches that are misspelled

| spell | volume (share) |
| --- | --- |
| youtube | 35.3% |
| facebook | 8.7% |
| amazon | 7.6% |
| google | 6.3% |
| weather | 2.2% |
| translate | 1.6% |
| com | 1.5% |
| instagram | 1.3% |
| walmart | 1.3% |
| ebay | 1.2% |



Search volume

The top 2000 searches have extremely high volume, while the vast majority of the remaining searches have very low volume.

Note that many of these extremely high-volume searches are not searches for something as such, but attempts to go to one of the popular sites above.
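The concentration described here can be quantified as the fraction of total volume held by the top N searches. The helper below is a hypothetical sketch on made-up volumes, not the calculation used for this report.

```python
def top_n_volume_share(volumes, n):
    """Fraction of total volume contributed by the n highest-volume searches."""
    ranked = sorted(volumes, reverse=True)
    return sum(ranked[:n]) / sum(ranked)


# Made-up volumes: two very large searches and a long tail of small ones.
volumes = [1000, 500, 10, 10, 10, 10, 10, 10, 10, 10]
share = top_n_volume_share(volumes, 2)
print(f"Top 2 searches hold {share:.1%} of total volume")
```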



SERP features

Note: there are at least two additional SERP feature types (Knowledge Panel and Videos) for which the sample size is too small to include.

The SERP features that appear in the most searches are Image pack and People also ask.

The Knowledge card greatly reduces clicks per search (cps), while the other SERP features have limited effect. Searches with the Shopping results SERP feature have a higher cps on average.

Low-difficulty keywords have fewer SERP features.

Thumbnail & Top stories is the most common SERP feature pairing.

Searches without SERP features tend to be low volume.
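A finding such as the most common feature pairing can be obtained by counting every unordered pair of features that occur together in a search. The sketch below assumes each search carries a set of feature names; the data is made up and only reuses feature names mentioned in this section.

```python
from collections import Counter
from itertools import combinations

# Made-up feature sets per search; feature names are taken from this section.
serp_features = [
    {"Thumbnail", "Top stories", "Image pack"},
    {"Thumbnail", "Top stories"},
    {"Image pack", "People also ask"},
    set(),  # a search without any SERP feature
]

pair_counts = Counter()
for features in serp_features:
    # Sort so that each unordered pair is always counted under the same key.
    for pair in combinations(sorted(features), 2):
        pair_counts[pair] += 1

most_common_pair, occurrences = pair_counts.most_common(1)[0]
print(most_common_pair, occurrences)
```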



Return rate

Searches with high return rates tend to have lower difficulty and to receive many more clicks.

Comparison of searches with same volume but different return rates

| return_rate | mean_cpc | mean_clicks | mean_difficulty |
| --- | --- | --- | --- |
| very high | 0.96 | 71423 | 18.4 |
| low | 0.70 | 15094 | 25.6 |
